Calling Patterns in Human Communication Dynamics 
Modern technologies not only provide a variety of communication modes, e.g., texting, cellphone
conversation, and online instant messaging, but they also provide detailed electronic traces of these 
communications between individuals. These electronic traces indicate that the interactions occur in 
temporal bursts. Here, we study the inter-call durations of the 100,000 most-active cellphone users 
of a Chinese mobile phone operator. We confirm that the inter-call durations follow a power-law 
distribution with an exponential cutoff at the population level but find differences when focusing 
on individual users. We apply statistical tests at the individual level and find that the inter-call 
durations follow a power-law distribution for only 3460 individuals (3.46%). The inter-call durations
for the majority (73.34%) follow a Weibull distribution. We quantify individual users using three 
measures: out-degree, percentage of outgoing calls, and communication diversity. We find that the 
cellphone users with a power-law duration distribution fall into three anomalous clusters: robot-based
callers, telecom frauds, and telephone sales. This information is of interest to both academics
and practitioners, mobile telecom operators in particular. In contrast, the individual users with a
Weibull duration distribution form the fourth cluster of ordinary cellphone users. We also discover 
more information about the calling patterns of these four clusters, e.g., the probability that a user 
will call the $c_r$-th most contacted person and the probability distribution of burst sizes. Our findings may
enable a more detailed analysis of the huge body of data contained in the logs of massive numbers of users.



Understanding the temporal patterns of individual hu- 
man interactions is essential in managing information 
spreading and in tracking social contagion. Human in- 
teractions, e.g., cellphone conversations and emails, leave 
electronic traces that allow the tracking of human inter- 
actions from the perspective of either static complex net- 
works [ll-Q or human dynamics Q. Because static net- 
works only describe sequences of instantaneous interact- 
ing links, temporal networks in which the temporal pat- 
terns of interacting activities for each node are recorded 
have recently received a considerable amount of research 
interest Q . Investigations of inter-event intervals be- 
tween two consecutive interacting actions, such as email 
communications @, [HI , short-message correspondences
pTH13j , cellphone conversations [lH llH , and letter
correspondences [IB4i3i indicate that human interactions
have non-Poissonian characteristics. Previous studies 
were conducted either on aggregate samples 
or on a small group of selected individuals @, 
[IH , but the communication behavior of individuals is not 
well understood. 

We study the complete voice information for cellphone 
users supplied by a Chinese cellphone operator and study 
the inter-event time between two consecutive outgoing 
calls (inter-call duration). Our studies are performed at 
both the individual and group levels. To ensure bet- 
ter statistics, the top 100,000 cellphone users with the 
largest number of outgoing calls are chosen as our data
sample, each having more than 997 outgoing calls. We 
propose a bottom-up approach to investigate individual 
cellphone communication dynamics: (1) finding the func- 
tional form of the distribution of each individual's inter- 
call durations, (2) grouping individuals with the same 
distribution, and (3) understanding the calling patterns 
for each group. We apply an automatic fitting technique
to each mobile phone user, and filter out two groups of 
users according to their inter-call duration distributions. 
One group is comprised of individuals with a power-law 
duration distribution (3,464 individuals) and the other 
is comprised of individuals with a Weibull duration dis- 
tribution (73,339 individuals). We demonstrate that the 
two groups exhibit different calling patterns and that the 
individuals from the power-law group exhibit anomalous 
communication behaviors (e.g., the group includes indi- 
viduals sending spam). 

Results 

Distribution at the population level. There are 
5,921,696 different individuals in our data set (see the 
data description in Materials and Methods). For each 
individual, we estimate the intraday inter-call durations 
($d$ seconds) (see definition of intraday inter-call durations
in Materials and Methods) and we find that 4,635,536
individuals have non-empty intraday durations ($n_d > 0$),
which we consider one unique sample when we investigate 
the distribution at the population level. To this end we 
also analyze the aggregate level where the data comprise 
only the durations of the top 100,000 individuals. 

Figure 1A shows the empirical distributions of the two
samples of aggregate data. Both curves exhibit excellent
power-law behavior in the range of [80, 2000] seconds.
We apply the least-squares method and find that
a linear fit gives the power-law exponent $\gamma_{\rm all} = 0.873$
for all the individuals and $\gamma_{{\rm top}\,10^5} = 0.942$ for the top
100,000 individuals. We compare the empirical
distributions obtained from our dataset with the
empirical distributions of inter-call durations based on a
different dataset provided by a European cellphone
operator jlj] (see also the supplementary information of
Ref. [20]), and note that the empirical distributions of
both datasets share very similar patterns for $d < 10^5$,
where only intraday inter-call durations are taken into
consideration. The reported power-law exponent $\gamma = 0.9$
in Ref. [20] is approximately equal to the estimated
exponent $\gamma_{\rm all}$ shown in Fig. 1A. A similar functional form
with a power-law exponent $\gamma = 0.7$ is also reported in
Ref. [15] for inter-call durations smaller than $10^5$. This
similarity is further consolidated by fitting the empirical
duration distributions with a power law
with an exponential cutoff.
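
The least-squares estimate of the exponent on a restricted range can be illustrated with a short sketch. This is only a minimal example under stated assumptions: the logarithmic binning, the bin count, the function name `loglog_exponent`, and the synthetic test sample are all hypothetical choices made for illustration, not the procedure used to produce Fig. 1A.

```python
import numpy as np

def loglog_exponent(durations, lo=80.0, hi=2000.0, n_bins=20):
    """Estimate gamma in p(d) ~ d**(-gamma) from a least-squares line
    fitted to the log-binned empirical density on [lo, hi]."""
    d = np.asarray(durations, dtype=float)
    edges = np.logspace(np.log10(lo), np.log10(hi), n_bins + 1)
    counts, _ = np.histogram(d, bins=edges)
    density = counts / (np.diff(edges) * d.size)   # normalise by bin width
    centers = np.sqrt(edges[:-1] * edges[1:])      # geometric bin centres
    keep = density > 0                             # log of empty bins is undefined
    slope, _ = np.polyfit(np.log10(centers[keep]), np.log10(density[keep]), 1)
    return -slope

# Synthetic sample (assumption): inverse-transform draws from
# p(d) ~ d**(-0.9) restricted to [1, 1e5].
rng = np.random.default_rng(0)
gamma_true, a, b = 0.9, 1.0, 1e5
e = 1.0 - gamma_true
u = rng.random(200_000)
samples = (a**e + u * (b**e - a**e)) ** (1.0 / e)
gamma_hat = loglog_exponent(samples)
```

On the synthetic sample the estimate should land close to the exponent used to generate it; on real call logs the result depends on the binning and on the chosen fitting range.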
FIG. 1. Probability distribution of the inter-call durations ($d$
seconds). (A) Distribution of inter-call durations at the population
level. The circle markers are shifted vertically by a
factor of 0.1 for better visibility. (B) Plots of the KS statistic
with respect to the truncation value $d_{\rm trun}$ for the Weibull and
exponential distributions. (C) Plots of the power-law exponent
$\gamma$ with respect to the average number of outgoing calls ($n_{\rm call}$)
for different groups. The red line stands for the power-law
exponent of the whole sample. (D) Probability distribution of
mean inter-call durations for different samples of individuals.

Figure 1A shows a clear deviation from the power-law
distribution in the tails of both curves, which is usually
interpreted as an exponential cutoff. To test which
distribution better fits the data, we apply the
Kolmogorov-Smirnov (KS) statistic, for which a smaller
value indicates a better fit. We set the truncation value
at 2000 seconds and find that for all individuals the tail
is better fit by the Weibull distribution (KS = 0.016)
than by the exponential distribution (KS = 0.063).
Similarly, for the top $10^5$ individuals the Weibull distribution
(KS = 0.006) also fits the data better than the
exponential distribution (KS = 0.066). Figure 1B
shows a plot of KS as a function of the truncation value
$d_{\rm trun}$. The KS statistic displays a more stable behavior
for the Weibull fit than for the exponential fit, indicating
that the Weibull distribution is better able to capture the
tail behavior than the exponential distribution.

We further divide the sequence of individuals accord- 
ing to the number of outgoing calls into 46 groups, sorted 
in ascending order. The first group comprises 135,536 
individuals and the remaining 45 groups each comprise 
100,000 individuals. We calculate the empirical distribu- 
tions of the aggregate inter-call durations for each group 
and find that all the distributions share patterns similar
to those shown in Fig. 1A. Figure 1C shows a plot
of the estimated power-law exponents with respect to
the average number of outgoing calls. All the power-law
exponents are lower than 1 and the mean value is
$0.896 \pm 0.033$.

Figure 1D shows the probability distributions of the
individual average inter-call durations calculated for (i)
all the individuals, (ii) the individuals with $n_d > 50$, and
(iii) the top 100,000 individuals. All three
curves exhibit an approximate M-shape characterized by
two peaks. For the sample of all individuals, there is a
large number of low-frequency individuals who do not
use a cellphone regularly. The influence of these
low-frequency callers is eliminated in the distributional curve
for $n_d > 50$. We compare this distributional curve with
the distribution of the top 100,000 individuals and find
that they exhibit the same M-shape with a central valley
at approximately $d = 650$, strongly indicating the
presence of two groups of individuals possessing differ- 
ent calling patterns across the sample. One group is of 
individuals that have low average inter-call duration val- 
ues, indicating a high frequency of outgoing calls, and the 
other is of individuals that have large average inter-call 
duration values, indicating a relatively low frequency of 
outgoing calls. We will later demonstrate that the group 
with a high frequency of outgoing calls is dominated by 
individuals with a power-law duration distribution and 
that the group with a low frequency of outgoing calls is 
dominated by the individuals with a Weibull duration 
distribution. 

Classification of cellphone users. Following the
above analysis at the aggregate level, we propose to
classify the individuals according to their duration
distributions. Motivated by Ref. [21], but here for each individual
cellphone user, we focus on the tail of the
distribution: we assume the candidate duration distributions
to be left-truncated and we assign each individual
a distribution that is either power law or Weibull. We
estimate the truncation value $d_{\min}$ together with the
distribution parameters by finding the minimum KS statistic.
We then apply statistical tests to check the significance of
the fitting parameters (see fitting distributions and
statistical tests in Materials and Methods). Finally, based on
statistical tests, we find that there are 3,464 individuals
whose intraday durations follow a power-law distribution
and 73,339 individuals whose intraday durations follow
a Weibull distribution (see determining the distribution
form in Materials and Methods).

FIG. 2. Classified results of individuals with power-law
distributions of intraday inter-call durations. (A) Probability
distributions of intraday durations for three randomly chosen
individuals. The markers of individuals 28012863 and 52701654
are shifted vertically by factors of $10^{-2}$ and $10^{-4}$ for better
visibility. The solid lines are the best maximum likelihood
estimation (MLE) fits to the power-law distributions, which give
the power-law exponents $\gamma = 2.15$, $\gamma = 2.03$, and $\gamma = 1.69$
for individuals 2308772, 28012863, and 52701654, respectively.
(B) Distribution of the estimated power-law exponents. (C)
Probability distribution of collective inter-call durations obtained by
aggregating the durations of different individuals from the
power-law group as one sample. The solid line is the best fit
to the data by means of the least-squares method, which gives
an estimated power-law exponent $\gamma = 1.69$. (D) Probability
distribution of the mean inter-call durations for the
power-law group.

FIG. 3. Classified results of individuals with Weibull
distributions of intraday inter-call durations. (A) Probability
distributions of intraday durations for three randomly chosen
individuals. The markers of individuals 5196860 and 6466665
are shifted vertically by factors of $10^{-2}$ and $10^{-4}$ for better
visibility. The solid lines are the best MLE fits to the Weibull
distributions, which give the Weibull exponents $\beta = 0.54$,
$\beta = 0.64$, and $\beta = 0.51$ for individuals 3263120, 28012863, and
6466665, respectively. (B) Distribution of the Weibull
exponents; the solid curve is the fit to a normal
distribution. (C) Probability distribution of collective inter-call
durations obtained by aggregating the durations of different
individuals from the Weibull group as one sample. The solid line is
the best fit to the data by means of the least-squares method,
which gives an estimated power-law exponent $\gamma = 0.882$.
(D) Probability distribution of the mean inter-call durations
for the Weibull group.

Figure 2A shows the empirical duration distributions
for three randomly chosen individuals (2308772,
28012863, and 52701654) whose intraday inter-call durations
follow a power-law distribution. The solid lines
correspond to the power-law fits with power-law exponents
$\gamma = 2.15$, $\gamma = 2.03$, and $\gamma = 1.69$ for individuals 2308772,
28012863, and 52701654, respectively. Figure 2B plots
the distribution of the estimated power-law exponents $\gamma$
for all individuals with an intraday inter-call duration
that follows a power-law distribution and shows that none
of the power-law exponents is lower than 1.5. This is
in sharp contrast to the power-law exponents lower than
1 that we found for the aggregate durations in Fig. 1C.
Note that there is a large fraction of individuals whose
power-law exponents are between 1 and 3, which are the
characteristic values for the Lévy regime (1, 3). Note that
the exponent 2 corresponds to the famous Zipf law. Having
all the power-law exponents, we calculate the mean
$\langle\gamma\rangle = 2.00 \pm 0.32$.

We investigate the distribution of the aggregate intraday
inter-call durations by treating the individual durations
from the power-law group as one unique sample.
For this aggregate data set, shown in Fig. 2C, we find
a power law with exponent $\gamma = 1.69$. We find another
striking feature in the power-law tail: the Weibull shape
disappears. Figure 2D plots the probability distribution
of the mean inter-call durations of the individuals in
the power-law group, where the peak agrees well with the
left peak in Fig. 1D.

Figure 3A plots the probability distributions of intraday
inter-call durations for three randomly chosen individuals
(3263120, 28012863, and 6466665) whose inter-call
durations follow a Weibull distribution. The solid lines are
the best maximum likelihood estimation (MLE) fits to
the Weibull distribution and the corresponding Weibull
exponents are $\beta = 0.54$, $\beta = 0.64$, and $\beta = 0.51$ for
individuals 3263120, 28012863, and 6466665, respectively.
Having the Weibull exponents for all individuals from
the Weibull group, we calculate the mean value of the
Weibull exponents $\langle\beta\rangle = 0.64 \pm 0.12$. Figure 3B shows
the distribution of the Weibull exponents $\beta$. For the sake of
comparison, we also present a normal distribution with
the parameters obtained by MLE fits on the sample of
Weibull exponents $\beta$. The overlap between the
empirical data and the normal distribution indicates that
the exponent $\beta$ is approximately normally distributed.

Figure 3C shows the distribution of inter-call durations
for the Weibull group at the aggregate level. Note that
the functional forms of the distribution in Fig. 3C and
the empirical distributions in Fig. 1A are similar,
suggesting that the distributions of the aggregate samples
are dominated by individuals with Weibull duration
distributions. Figure 3D plots the probability distribution
of the mean inter-call durations for the individuals in the
Weibull group. The peak is in good agreement with the
right peak in Fig. 1D.
FIG. 4. Analysis of the remaining individuals. (A) Probability
distribution of inter-call durations for three individuals
whose durations approach power-law behavior without
passing the statistical tests. (B) Probability distribution
of inter-call durations for three individuals whose duration
distributions resemble a Weibull shape but are not confirmed by the
statistical tests. (C) Plots of duration distributions for three
individuals whose distribution shapes are uncommon. (D)
Probability distribution of collective inter-call durations obtained by
aggregating the durations of different individuals from the
Weibull group as one sample. The solid line is the best fit to
the data by means of the least-squares method, which gives an
estimated power-law exponent $\gamma = 0.879$. (E) Probability
distribution of the mean inter-call durations for the remaining
individuals.

We next investigate the distribution of inter-call durations
for the remaining 23,197 individuals. We find that
for a small fraction of individuals (close to 2%), the
inter-call durations approximately follow a power law, as shown in Fig. 4A.
Because our statistical tests reject the null hypothesis
that these durations follow a power law, these
individuals are excluded from the power-law group. We
find that more than 97% of the individuals have Weibull
tail distributions, as shown in Fig. 4B. However, the fact
that the fitting range is lower than 1.5 orders of magnitude
(83% of the individuals are in the range of [1, 1.5])
prevents these individuals from being classified in the
Weibull group. Figure 4C shows a very small number
of individuals whose inter-call durations cannot be
described by either power-law or Weibull distributions.
Because most of the individuals have Weibull-tail
distributions, the distributions of aggregate inter-call durations
and the mean inter-call durations exhibit patterns very
similar to the results obtained from the Weibull group
[see Figs. 4D and 4E and Figs. 3C and 3D].

Calling patterns for power-law and Weibull
groups. Using three measures, we quantitatively
distinguish the calling patterns of the individuals belonging
to the two classified groups.

(i) The out-degree $k_i$ describes the number of different
callees for a specified cellphone user.

(ii) The percentage of outgoing calls $r_{\rm out}$ is defined by
dividing the number of outgoing calls by the total
number of calls. Note that a number sending
spam (junk message pusher) is characterized by
$r_{\rm out} = 1$.

(iii) The communication diversity $\phi_i$. Motivated by the
social diversity proposed in Ref. [22], we define the
communication diversity $\phi_i$ as a function of Shannon
entropy to quantify how cellphone users
split their calls among their friends,

$$\phi_i = -\frac{\sum_{j=1}^{k_i} p_{ij} \log p_{ij}}{\log(k_i)}. \qquad (1)$$

Here $k_i$ is the out-degree and $p_{ij}$ is the probability
defined as $p_{ij} = n_{ij}/n_i = n_{ij}/\sum_j n_{ij}$, where $n_{ij}$ is
the number of outgoing calls from individual $i$ to
individual $j$ and $n_i$ is the total number of outgoing
calls for individual $i$. A higher $\phi_i$ value indicates
that the caller's outgoing calls are split more evenly
among his friends and a smaller $\phi_i$ value implies that
most of the caller's outgoing calls are to only one
of his friends. Note that we define $\phi_i = 0$ when
$k_i = 1$.
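
The communication diversity is a normalised Shannon entropy, which can be sketched in a few lines. The function name `communication_diversity` and the input format (one callee identifier per outgoing call) are assumptions made for this illustration.

```python
import math
from collections import Counter

def communication_diversity(callees):
    """Normalised Shannon entropy of one caller's outgoing calls.

    `callees` holds the callee of every outgoing call (one entry per call).
    Returns 0.0 when the caller has a single contact (k = 1)."""
    counts = Counter(callees)
    k = len(counts)                      # out-degree: number of distinct callees
    if k == 0:
        raise ValueError("caller has no outgoing calls")
    if k == 1:
        return 0.0                       # log(1) = 0, so the ratio is defined as 0
    n = sum(counts.values())             # total number of outgoing calls
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)
```

A caller who splits calls evenly among all contacts obtains a value of 1, while a caller who places nearly all calls to one contact obtains a value close to 0.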

In order to distinguish between the calling patterns
of the power-law group of Fig. 2 and the Weibull group
of Fig. 3, in Fig. 5 we plot the distribution of the
percentage of outgoing calls $r_{\rm out}$ and the distribution of the
communication diversity $\phi$. Figures 5A and 5C reveal
strikingly different patterns: (i) in the power-law group,
the probability $p(r_{\rm out})$ is a monotonically increasing
function of $r_{\rm out}$ that reaches a maximum value at $r_{\rm out} = 1$
(the characteristic value for spam), but in the Weibull
group, the probability $p(r_{\rm out})$ is a non-monotonic function
of $r_{\rm out}$ that has its maximum value close to the center at
$r_{\rm out} = 0.56$, and (ii) in the power-law group, the
probability $p(\phi)$ exhibits three pronounced peaks at $\phi = 0$,
$\phi = 0.84$, and $\phi = 1$, but in the Weibull group, the
probability $p(\phi)$ has only one peak at $\phi = 0.82$. We further
estimate the average value of the percentage of outgoing
calls $\langle r_{\rm out}\rangle = 0.89 \pm 0.13$ for the power-law group and
$\langle r_{\rm out}\rangle = 0.57 \pm 0.11$ for the Weibull group. Our
analysis indicates that the individuals in the power-law group
exhibit more extreme calling behaviors than those in the
Weibull group, e.g., highly frequent call initiation, a high
percentage of outgoing calls, and either all calls to only
one callee or equally distributing calls among all callees.

Figures 5B and 5D plot the out-degree $k$ with
respect to the communication diversity $\phi$ and thus
provide additional evidence that the behavior of individuals
in the power-law group differs greatly from the
behavior of individuals in the Weibull group. The
individuals in the power-law group form three clusters in the
($\phi$, $k$) plane, which are highlighted by the three ellipses
in panel (B). The three clusters are also consistent with
the three peaks of $p(\phi)$ in Fig. 5A. Figure 5D, on the
other hand, shows only one large cluster for the Weibull
group. Taking the two panels together, we see that the
communication diversity $\phi$ increases with the out-degree
$k$ on average. In the power-law group we further assign
the individuals with $\phi < 0.1$ to cluster 1, the
individuals with $0.7 < \phi < 0.9$ and $50 < k < 200$ to cluster 2,
and the individuals with $\phi > 0.9$ and $k > 700$ to cluster
3. We find that there are 762, 710, and 1369
individuals in clusters 1, 2, and 3, with average degrees of 21.76, 114.98,
and 2083.3 and mean percentages of
outgoing calls of 0.99, 0.80, and 0.94,
respectively. We assign the individuals in the Weibull
group to cluster 4 and find that the average degree and
mean percentage of outgoing calls are 245.13 and 0.57,
respectively. From our analysis, we first infer that the
individuals in power-law cluster 1 (the ones characterized
by a high frequency of call initiation, a small
number of callees, or an allocation of almost all outgoing calls
to only one callee) are robot-based users. We next see
that the individuals in cluster 3 (the ones characterized
by a high frequency of call initiation, a large number of
callees, and an even distribution of outgoing calls among
all callees) are associated with telecom frauds and
telephone sales. We also note that the individuals in cluster
4 are ordinary cellphone users. We next describe further
differences in cellphone communication activities among
the 4 clusters, e.g., the probability that a caller will call
the $c_r$-th most contacted person and the burst size probability
during burst periods.
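
The cluster thresholds quoted above translate directly into a small rule-based function. This is a sketch only: the function name `assign_cluster`, the string labels for the two groups, and the choice to return `None` for power-law users outside the three stated regions are assumptions made for illustration.

```python
def assign_cluster(group, phi, k):
    """Map a user to clusters 1-4 using the thresholds stated in the text.

    group: "power_law" or "weibull" (labels are hypothetical)
    phi:   communication diversity
    k:     out-degree
    Returns None when a power-law user lies outside all three regions."""
    if group == "weibull":
        return 4
    if group != "power_law":
        return None
    if phi < 0.1:
        return 1
    if 0.7 < phi < 0.9 and 50 < k < 200:
        return 2
    if phi > 0.9 and k > 700:
        return 3
    return None
```

Because the three regions do not cover the whole ($\phi$, $k$) plane, some power-law users remain unassigned, which is consistent with the three cluster sizes summing to fewer than 3,464 individuals.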

FIG. 5. Calling patterns for the individuals from the power-law
and Weibull groups. (A) Distribution of the percentage of
outgoing calls $r_{\rm out}$ and the communication diversity $\phi$ for the power-law group.
(B) Plots of out-degree $k$ with respect to communication
diversity $\phi$ for the power-law group. Three ellipses correspond to
the three clusters of individuals. (C) Same as (A) but for the
Weibull group. (D) Same as (B) but for the Weibull group.

Because most of the calls (mean 99.5% and min 94%)
made by individuals in cluster 1 are to only one
contact, we now calculate the probability that individuals
belonging to the other 3 clusters will call the
$c_r$-th most contacted person. In order to rule out the influence of
newly entering cellphone users, we take into account only
those individuals listed in the data on the starting date
of 28 June 2010. Figure 6 shows the average calling
frequency $f(c_r)$ of the $c_r$-th most contacted friend for
individuals with the same degree. In cluster 2 there is
a linear relationship between $f(c_r)$ and $\ln c_r$ [panel (A)],
which indicates an exponential distribution in the
number of outgoing calls to different contacts [23]. We see
that the slope obtained between $f(c_r)$ and $\ln c_r$ increases
as the out-degree $k$ increases, but the small number of
individuals prevents us from finding the functional form between
the slopes and the out-degree values $k$. We also observe
power-law behavior between $f(c_r)$ and $c_r$ in cluster 3
[panel (B)]. The least-squares linear fits provide an
estimated power-law exponent of 0.52 and also show that
the power-law exponent is not affected
by the out-degree $k$. Figure 6C plots $f(c_r)^b$ versus $\ln c_r$
for cluster 4, where $b$ is the value that gives the maximum
correlation coefficient of least-squares linear fits to $f(c_r)^b$
versus $\ln c_r$ when $b$ is varied from 0.01 to 0.99 with a step
of 0.01. The linear relationship between $f(c_r)^b$ and $\ln c_r$
suggests that the number of outgoing calls to contacts
follows a stretched exponential distribution 0, [24]. In
panel (D) we show the exponent $b$ plotted with respect
to the out-degree $k$, where we observe a striking linear
relationship: $b = -4.836 \times 10^{-4} k + 0.329$. Here we report
that the probability to call the $c_r$-th most contacted person is in
sharp contrast to the results reported in Ref. [25], where
a Zipf law with a power-law exponent 1.5 is observed
when, in contrast to our "microscopic" study, individuals
are not grouped according to their distributions of
inter-call durations.
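
The scan over the exponent $b$ described for cluster 4 can be sketched as follows. This is an illustrative reading of the procedure, assuming the Pearson correlation coefficient is the quantity being maximised; the function name `best_stretch_exponent` and the synthetic rank-frequency data are hypothetical.

```python
import numpy as np

def best_stretch_exponent(ranks, freqs):
    """Scan b = 0.01, 0.02, ..., 0.99 and return the b for which freqs**b
    is most linearly related to ln(rank), with that |correlation|."""
    x = np.log(np.asarray(ranks, dtype=float))
    f = np.asarray(freqs, dtype=float)
    best_b, best_r = None, -1.0
    for b in np.arange(1, 100) / 100.0:
        r = abs(np.corrcoef(x, f ** b)[0, 1])   # strength of the linear relation
        if r > best_r:
            best_b, best_r = float(b), float(r)
    return best_b, best_r

# Synthetic data (assumption): constructed so that f**0.3 is linear in ln(rank).
ranks = np.arange(1, 101)
freqs = (5.0 - np.log(ranks)) ** (1 / 0.3)
b_hat, r_hat = best_stretch_exponent(ranks, freqs)
```

The grid search is deliberately simple: with only 99 candidate values, an exhaustive scan is cheap and avoids any dependence on an optimiser's starting point.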

It was recently proposed that the distribution of burst
sizes indicates the presence of memory behaviors in the
timing of consecutive events [15], where the deviation
from exponential distributions is a hallmark of correlated
properties. For a given series of events, a burst period
is a cluster of consecutive events, each following the previous
event within a short time interval $\Delta t$, which is an
arbitrarily assigned value in empirical analysis. The burst
size $e_b$ is defined as the number of events in a burst
period.
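
The burst-size definition can be made concrete with a short sketch. Two details are assumptions of this illustration rather than statements from the text: an isolated event is counted as a burst of size 1, and a gap exactly equal to $\Delta t$ keeps the event inside the current burst. The function name `burst_sizes` is hypothetical.

```python
def burst_sizes(event_times, dt):
    """Split a sorted sequence of event times (seconds) into bursts.

    An event joins the current burst when it follows the previous event
    within dt; otherwise it starts a new burst."""
    sizes = []
    current = 0          # number of events in the burst being built
    previous = None
    for t in event_times:
        if previous is not None and t - previous <= dt:
            current += 1
        else:
            if current:
                sizes.append(current)
            current = 1
        previous = t
    if current:
        sizes.append(current)
    return sizes
```

For example, with $\Delta t = 100$ seconds the call times 0, 50, 120, 1000, 1040, and 5000 split into three bursts of sizes 3, 2, and 1.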

FIG. 6. Rank-ordering plot showing the average calling
frequency $f(c_r)$ of the $c_r$-th most contacted friend for the users
with the same degree. (A) Plots of $f(c_r)$ as a function of $\ln c_r$
for cluster 2. (B) Log-log plots of $f(c_r)$ with respect to $c_r$ for
cluster 3. (C) Plots of $f(c_r)^b$ versus $\ln c_r$ for cluster 4. (D)
Scatter plots of $b$ with respect to $k$ for cluster 4.

Based on our dataset, Fig. 7 shows the probability
distribution of burst sizes $e_b$ in burst periods at the
aggregate level for the 4 clusters, setting $\Delta t = 100$, 300,
600, and 1000 seconds. In panel (A) we find the probability
distributions of $e_b$ for cluster 1. Although we find a
very good power-law relationship between $p(e_b)$ and $e_b$,
with an exponent 2.677 for $\Delta t = 100$ seconds, the
distributions deviate from power-law distributions and tend
to exponential distributions for $\Delta t = 300$, 600, and 1000
seconds. Panel (B) shows that the probability
distribution of $e_b$ exhibits excellent exponential distributions
for varying values of $\Delta t$ for cluster 2. Panel (C) plots
the probability distributions of $e_b$ for cluster 3. We see
the power-law behavior of $p(e_b)$ with an exponent 2.61
only when $\Delta t = 300$ seconds. When $\Delta t = 600$ and 1000
seconds, $p(e_b)$ switches from power-law behavior to a
bimodal pattern (with exponential tails). Panel (D) shows
that the probability distributions of $e_b$ corresponding to
different values of $\Delta t$ for individuals in cluster 4 all
display very good power-law behavior, and that the
power-law decay exponent is 3.6. Comparing our distribution
with the distribution reported in Fig. 2A in Ref. [15], we
find that the distribution shapes are very similar for
cluster 4, the only difference being that the extremely large
burst sizes $e_b > 500$ disappear from the plots because the
individuals with very large burst sizes are assigned to cluster
1.

FIG. 7. Distribution of burst sizes $e_b$ in burst periods. (A)
Cluster 1 (PDF). (B) Cluster 2 (CDF). (C) Cluster 3 (PDF).
(D) Cluster 4 (PDF).

Discussion

Contrary to common belief, we find that only 3.46% of
callers have inter-call durations that follow a power-law
distribution. The majority of callers (73.34%) have
inter-call durations that follow a Weibull distribution. Further
examination reveals that callers with a power-law
distribution exhibit anomalous and extreme calling patterns
often linked to robot-based calls, telecom frauds, or
telephone sales, information valuable to both academics and
practitioners, especially mobile telecom providers. We
note that Weibull distributions are ubiquitous in such
routine human activities as intervals for online gamers
[26] and intertrade intervals in stock trading [27], HH .

Although most of the individuals exhibit Weibull dis- 
tributions of the inter-call durations, the distribution at 
the population level is a power law with an exponential 
cutoff, consistent with other works using mobile phone 
communication data from other sources [15, 20]. We
argue that a superposition of individuals' heterogeneous
calling behaviors leads to the exponentially truncated 
power-law distribution at the population level, showing 
the importance of different characteristic scales. 

Although individual callers exhibit heterogeneities 
across the entire population and their personal activities 
are also heterogeneous, individual callers can be grouped 
into clusters according to their similarities. The findings 
reported in this paper enable us to construct dynamic 
models at an individual level that agree with empirical 
collective properties. Every reasonable dynamic model 
for cellphone usage should include the major findings of 
this paper, i.e., that individuals are not identical and do 
not exhibit identical behavior. Our strategy is to propose 
models based, not on individuals, but on clusters of in- 
dividuals. Thus to accurately model the trigger process 
in human activity we need a precise classification of in- 
dividuals according to the similarities in their activities, 
and also a detailed investigation of the complete activity 
log for each individual. 

Materials and Methods 

Data description. Our data, which are provided 
by a cellphone provider in China, contain all the call- 
ing records covering two periods. One is from 28 June 
2010 to 24 July 2010 and the other is from 1 October 
2010 to 31 December 2010. For unknown reasons, the 
calling logs for a few hours on certain days (October
12, November 5, 6, 13, 21, and 27 and December 6, 8,
21, and 22) are missing, and they are excluded from our
analysis, which results in a total of 109 days. 

For each record, we have the
caller number, callee number, call starting time, call
length, and call status. The caller and callee numbers
are encrypted in order to protect personal privacy. The
call status indicates whether the call terminated
normally. We only take into account calls
that begin and end normally; calls that are
not completed or are interrupted are discarded. To
better explain our data, Fig. 8A shows the call records
for a given individual subscriber, where a call starts at
$t^s$ and ends at $t^e$. We usually have

$$\cdots < t_i^s < t_i^e < t_{i+1}^s < t_{i+1}^e < \cdots . \qquad (2)$$

We further check whether $t_i^e$ is less
than $t_{i+1}^s$ for each individual. Records that do not
satisfy the inequality $t_i^e < t_{i+1}^s$ can be attributed to
recording errors introduced by the system, and the
$(i+1)$-th call record is discarded.

Definition of intraday inter-call durations. As
shown in Fig. 8A, the inter-call duration is defined as
the time that elapses between two consecutive calls and it 
can be calculated via di = <f — In order to avoid the 
influence on the results of discontinuous recording days, 
which produce very large inter-call durations, we restrict 
the durations to a period of one day (the typical human 
circadian rhythm). Although it might seem obvious to
separate the days at midnight (12:00 A.M.), late-night calls
(made by lonely people, lovers, and friends) are common,
so we divide the days at 4:00 A.M., which is the time
point associated with the lowest call volume in a
24-hour period (see Fig. 8B). This allows us to take into
account the people who go out and stay awake later as
well. Our restriction is equivalent to excluding inter-call
durations that span the dividing point (4:00 A.M.).
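
The 4:00 A.M. day boundary can be sketched in code as follows. This is a minimal illustration that assumes the duration is measured between the start times of consecutive calls; which time stamps enter the difference is an assumption here, and the names `logical_day` and `intraday_durations` are hypothetical.

```python
from datetime import datetime, timedelta

DAY_START_HOUR = 4  # days are divided at 4:00 A.M., as described in the text

def logical_day(ts):
    """Calendar date of the 4:00-to-4:00 window that contains ts."""
    return (ts - timedelta(hours=DAY_START_HOUR)).date()

def intraday_durations(call_starts):
    """Gaps in seconds between consecutive calls that fall in the same
    4:00-to-4:00 window; gaps spanning 4:00 A.M. are dropped."""
    times = sorted(call_starts)
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(times, times[1:])
        if logical_day(earlier) == logical_day(later)
    ]
```

Shifting every time stamp back by four hours before taking the calendar date keeps a call at 00:10 in the same window as a call at 23:50 on the previous evening, while a pair of calls at 03:59 and 04:01 is separated.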

Fitting distributions and statistical tests. A simple
approach based on maximum likelihood estimation
(MLE) fits and Kolmogorov-Smirnov (KS) tests is used
to check whether the candidate distributions (power-law
or Weibull) can be used to fit the individual intraday
inter-call durations. Because people are more interested
in the distribution form of large durations, we assume
that the durations larger than a truncation value $d_{\min}$ are
described by the candidate distributions, such that

$$p(d) \sim d^{-\gamma}, \quad d > d_{\min}, \qquad (3)$$

$$p(d) = \alpha \beta d^{\beta - 1} \exp(-\alpha d^{\beta}), \quad d > d_{\min}. \qquad (4)$$

We also determine the lower boundary $d_{\min}$ as an
additional parameter. Once $d_{\min}$ is obtained, the
distribution parameters can be estimated by means of MLE fits
to the left-truncated candidate distribution. Hence, the
accuracy of the estimated $d_{\min}$ plays an important role in
estimating accurate distribution parameters. Inspired by
the method proposed in Ref. [21], the best $d_{\min}$ is
associated with the truncated sample with the smallest KS
value. The truncated sample is obtained by discarding
the durations below $d_{\min}$ in the original duration sample.
After the best $d_{\min}$ and the corresponding distribution
parameters are obtained, we use the KS test and the
Cramér-von Mises (CvM) test to check the fitting. The null hypothesis $H_0$ for our
KS test and CvM test is that the data ($d > d_{\min}$) are
drawn from the candidate distribution (power-law
distribution or Weibull distribution).
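
The selection of $d_{\min}$ by the smallest KS value can be sketched for the power-law candidate. This is a minimal illustration under stated assumptions, not the fitting code behind the reported results: it uses the standard continuous power-law MLE, $\gamma = 1 + n / \sum_i \ln(d_i / d_{\min})$, which requires $\gamma > 1$; the function name `fit_power_law_tail` and the `min_tail` guard are hypothetical; and the significance tests are omitted.

```python
import numpy as np

def fit_power_law_tail(durations, min_tail=50):
    """Scan candidate d_min values; for each, fit gamma by MLE on the tail
    d >= d_min and keep the d_min giving the smallest KS distance."""
    d = np.sort(np.asarray(durations, dtype=float))
    best = None
    for d_min in np.unique(d)[:-1]:          # skip the maximum: tail would be degenerate
        tail = d[d >= d_min]
        n = tail.size
        if n < min_tail:                     # tails only shrink as d_min grows
            break
        gamma = 1.0 + n / np.sum(np.log(tail / d_min))   # continuous MLE
        model_cdf = 1.0 - (tail / d_min) ** (1.0 - gamma)
        # KS distance: largest gap between the model CDF and the empirical steps
        upper = np.arange(1, n + 1) / n
        lower = np.arange(0, n) / n
        ks = max(np.max(upper - model_cdf), np.max(model_cdf - lower))
        if best is None or ks < best[2]:
            best = (float(d_min), float(gamma), float(ks))
    return best  # (d_min, gamma, KS), or None if the sample is too small
```

The same scan structure applies to the Weibull candidate once the power-law MLE and CDF are swapped for a left-truncated Weibull fit; that variant is not shown here.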

Determining the distribution form. A sample
of individual intraday inter-call durations is taken to
conform to a power-law distribution if it (i) passes
either of the two tests at the significance level 0.01 and (ii)
exhibits a fitting range of not less than 1.5 orders of
magnitude. For Weibull distributions, in addition to the two
above conditions, the Weibull exponent $\beta$ of the intraday
duration sample must be in the range (0, 1). Because a
power-law distribution is a two-parameter model and a
Weibull distribution is a three-parameter model, we first
filter out the individuals with durations that follow a
power-law distribution and then pass the remaining
individuals to the Weibull filtering procedure.
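
The two-stage filter can be summarised in a small decision function. The sketch below rests on assumptions that the text does not spell out: a test is treated as passed when its p-value exceeds the significance level, and the fitting range is measured as $\log_{10}(d_{\max}/d_{\min})$. The function name `classify_user` and the dictionary keys are hypothetical.

```python
import math

def classify_user(pl, wb, alpha=0.01, min_decades=1.5):
    """pl and wb are dicts holding one user's fit summary with keys
    'p_ks', 'p_cvm', 'd_min', 'd_max', and (Weibull only) 'beta'."""
    def acceptable(fit):
        # Passing either test is enough (assumed: p-value above alpha).
        passes_test = fit["p_ks"] > alpha or fit["p_cvm"] > alpha
        decades = math.log10(fit["d_max"] / fit["d_min"])
        return passes_test and decades >= min_decades

    if acceptable(pl):                       # the power-law filter is applied first
        return "power_law"
    if acceptable(wb) and 0.0 < wb["beta"] < 1.0:
        return "weibull"
    return "unclassified"
```

Applying the power-law filter first mirrors the order described above, so a user whose durations satisfy both sets of conditions is placed in the power-law group.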




FIG. 8. Definition of intraday inter-call durations. (A)
Schematic chart of call logs for an individual. (B) Intraday
pattern of the number of calls.